Adjustment of garbage collection parameters in a storage system

ABSTRACT

A system, method, and machine-readable storage medium for performing garbage collection in a distributed storage system are provided. In some embodiments, an efficiency level of a garbage collection process is monitored. The garbage collection process may include removal of one or more data blocks of a set of data blocks that is referenced by a set of content identifiers. A set of slice services and the set of data blocks may reside in a cluster, and a set of filters may indicate whether the set of data blocks is in-use. At least one parameter of a filter of the set of filters may be adjusted (e.g., increased or reduced) if the efficiency level is below an efficiency threshold. Garbage collection may be performed on the set of data blocks in accordance with the set of filters.

TECHNICAL FIELD

The present description relates to data storage systems, and more specifically, to a system, method, and machine-readable storage medium for improving system operation by improving the performance of garbage collection to recover storage segments as free space.

BACKGROUND

A plurality of storage nodes organized as a cluster may provide a distributed storage architecture configured to service storage requests issued by one or more clients of the cluster. The storage requests are directed to data stored on storage devices coupled to one or more of the storage nodes of the cluster. The data served by the storage nodes may be distributed across multiple storage units embodied as persistent storage devices, such as hard disk drives, solid state drives, flash memory systems, or other storage devices. The storage nodes may logically organize the data stored on the devices as volumes accessible as logical units. Each volume may be implemented as a set of data structures, such as data blocks that store data for the volume and metadata blocks that describe the data of the volume. For example, the metadata may describe, e.g., identify, storage locations on the devices for the data. The data of each volume may be divided into data blocks. The data blocks may be distributed in a content driven manner throughout the nodes of the cluster.

A client may write data to, read data from, and/or delete data stored in the distributed storage system. Data may be deleted from the system when a client address at which the data is stored is overwritten with other data or when a client address becomes invalid (e.g., a file or object is deleted). A garbage collection process may remove data that is no longer in use from the distributed storage system. There is not a one-to-one mapping, however, between the client addresses and stored data blocks because multiple client addresses may have the same data block referenced by the same block identifier. It may be desirable for the system to only delete data that is no longer needed. For example, a data block should not be deleted if it is being referenced by another client address.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is best understood from the following detailed description when read with the accompanying figures.

FIG. 1 illustrates a system for a distributed data storage system according to one or more aspects of the present disclosure.

FIG. 2 illustrates a more detailed example of data storage in the system according to one or more aspects of the present disclosure.

FIG. 3 illustrates a system including a cluster of storage nodes coupled to a content manager that performs garbage collection according to one or more aspects of the present disclosure.

FIG. 4 illustrates an example bloom filter in accordance with one or more aspects of the present disclosure.

FIG. 5 illustrates a flow diagram of a method of performing garbage collection using one or more bloom filters in a distributed data storage system according to one or more aspects of the present disclosure.

FIG. 6 illustrates a flow diagram of a method of determining a false-positive rate according to one or more aspects of the present disclosure.

FIG. 7 illustrates a flow diagram of a method of adjusting the false-positive rate for the set of slice services based on a hardware constraint according to one or more aspects of the present disclosure.

FIG. 8 illustrates a flow diagram of a method of performing garbage collection in a distributed data storage system according to one or more aspects of the present disclosure.

FIG. 9 illustrates example efficiency sets in accordance with one or more aspects of the present disclosure.

DETAILED DESCRIPTION

All examples and illustrative references are non-limiting and should not be used to limit the claims to specific implementations and embodiments described herein and their equivalents. For simplicity, reference numbers may be repeated between various examples. This repetition is for clarity only and does not dictate a relationship between the respective embodiments, unless noted otherwise. Finally, in view of this disclosure, particular features described in relation to one aspect or embodiment may be applied to other disclosed aspects or embodiments of the disclosure, even though not specifically shown in the drawings or described in the text.

A distributed storage system may include one or more storage nodes, and each storage node may include one or more slice services. In the present disclosure, “slice service” may be used interchangeably with “metadata service”. A slice service may refer to metadata for a volume of data and/or references to data blocks that compose the volume. Each slice service may include one or more volumes, and a client may store data to multiple volumes, retrieve data from multiple volumes, and/or modify data stored on multiple volumes. A client may write data to, read data from, and/or delete data stored in the distributed storage system. A garbage collection process may remove data that is no longer in use from the distributed storage system. A slice service may transmit a set of one or more bloom filters to a block service to indicate to the block service whether a data block referenced by the block service is in-use or not in-use. A bloom filter is a type of bit field that may be used for membership testing. For example, a bloom filter may indicate whether a set of data blocks is in-use or is not in-use. In a bloom filter, a data block that is in-use is marked as in-use, and a data block that is not in-use is marked either as in-use or as not in-use. Accordingly, use of the bloom filter may provide for a false positive, but not a false negative. The block service receives the set of bloom filters and performs garbage collection based on the set of bloom filters. For example, the block service may remove data blocks that are indicated as not in-use and may leave data blocks that are indicated as in-use. The bloom filter may have various parameters, such as a size of the bloom filter (e.g., a total number of bits in the bloom filter), a target fullness of the bloom filter (e.g., a total number of bits that is set to 1), and a number of hash functions used when constructing the bloom filter.
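
As a non-limiting illustration, the behavior described above may be sketched in Python as follows. The names `BloomFilter` and `sweep`, the use of a salted SHA-256 digest to derive bit positions, and the one-byte-per-bit layout are assumptions made for readability and are not taken from the present disclosure.

```python
import hashlib


class BloomFilter:
    """Bit field used for membership testing (illustrative sketch only)."""

    def __init__(self, size_bits: int, num_hashes: int) -> None:
        self.size_bits = size_bits      # total number of bits in the filter
        self.num_hashes = num_hashes    # number of hash functions used
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, block_id: str):
        # Assumption: derive the hash functions by salting a single digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{block_id}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, block_id: str) -> None:
        """Mark a block identifier as in-use."""
        for pos in self._positions(block_id):
            self.bits[pos] = 1

    def might_contain(self, block_id: str) -> bool:
        """True for every added identifier; may also be True for others."""
        return all(self.bits[pos] for pos in self._positions(block_id))

    def fullness(self) -> float:
        """Fraction of bits that are set to 1."""
        return sum(self.bits) / self.size_bits


def sweep(stored_ids, bloom_filters):
    """Return the stored identifiers that no filter indicates as in-use."""
    return {
        block_id
        for block_id in stored_ids
        if not any(f.might_contain(block_id) for f in bloom_filters)
    }
```

In this sketch a block that is in-use is never returned by `sweep`, while a block that is not in-use may occasionally be retained, which corresponds to a false positive without a false negative.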

As more data is stored in the distributed storage system and/or clients have a greater number of nodes on a single cluster, it may be desirable to dynamically monitor and dynamically control the garbage collection process to ensure that an efficiency level of the garbage collection process does not fall below an efficiency threshold. The efficiency threshold may have an inverse relationship with the false-positive rate. For example, it may be desirable to ensure that the false-positive rate level of the garbage collection process does not rise above the false-positive rate threshold. The false-positive rate and the efficiency level may add up to one-hundred percent.

The present disclosure provides a content manager that monitors an efficiency level of a garbage collection process, which may include one or more rounds of garbage collection. The content manager may dynamically monitor the efficiency level using a bloom filter or an efficiency set for each round of garbage collection and may determine, for the round of garbage collection, a false-positive rate for a set of slice services based on monitoring the efficiency level. For example, the content manager may monitor the efficiency of a round of garbage collection by using one of several methods, which may include the efficiency sets to determine the false-positive rate, or by analyzing the bloom filters themselves, either by assuming they are independent or by applying them to a random set of data. Each bloom filter may have a false-positive rate, and the content manager may determine the overall false-positive rate for a round of garbage collection based on the false-positive rates of the individual bloom filters. The overall false-positive rate for a round of garbage collection may be higher than the false-positive rate of each of the individual bloom filters.
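
One way the individual false-positive rates might be combined, under the independence assumption mentioned above, is sketched below. The sketch assumes that a data block that is not in-use is retained whenever at least one bloom filter reports it as in-use; that combination rule and the function names are illustrative assumptions rather than the disclosed method.

```python
import math


def overall_false_positive_rate(per_filter_rates):
    """Combine per-filter rates, assuming the filters are independent.

    A not-in-use block is correctly identified only if every filter
    reports it as not in-use, so the rates compound across filters.
    """
    all_filters_correct = math.prod(1.0 - rate for rate in per_filter_rates)
    return 1.0 - all_filters_correct


def efficiency_level(false_positive_rate):
    """Efficiency level and false-positive rate add up to one (100%)."""
    return 1.0 - false_positive_rate
```

For example, two independent filters that each have a false-positive rate of 10% would yield an overall rate of 19% under this assumption, which is higher than the rate of either filter alone.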

In some examples, the content manager may control, based on the efficiency level, the garbage collection process by dynamically adjusting one or more of the bloom filters (e.g., reducing or increasing a size of the filter or a target fullness of the filter), with each round of garbage collection. For example, for one or more rounds of garbage collection, the content manager may adjust one or more of the filter parameters. For example, it may be desirable for the content manager to adjust one or more filter parameters if the content manager detects that the efficiency level is below an efficiency threshold, potentially resulting in a higher efficiency garbage collection. Trade-offs may exist between improving the false-positive rate and system performance. For example, increasing the filter size may result in a penalty in terms of memory consumption (e.g., consumes more memory) and communication (e.g., uses higher bandwidth). In some examples, it may be desirable for the content manager to adjust one or more filter parameters if the content manager detects heavy load on the system (e.g., memory and/or network constraints), potentially resulting in a lower efficiency garbage collection.

Aspects of the present disclosure can provide several benefits. For example, aspects may provide for reducing the false-positive rate of data that has been marked as in-use, but is actually not in-use. Accordingly, a large amount of data that would have remained on the system and falsely identified as in-use may be recycled and used for other purposes. Aspects may also provide for a higher efficiency garbage collection process that results in fewer rounds of garbage collection because more data is identified per round of garbage collection compared to other garbage collection processes. Aspects may also provide for balancing trade-offs between system performance and garbage collection, allowing for a more robust system performance and a better user experience.
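
A minimal sketch of such a per-round adjustment is given below. The `FilterParams` type, the doubling and halving factor, and the choice to give a heavy-load condition priority over the efficiency check are all hypothetical; the present disclosure does not prescribe a particular policy.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FilterParams:
    size_bits: int          # total number of bits in the bloom filter
    target_fullness: float  # fraction of bits intended to be set to 1
    num_hashes: int         # number of hash functions


def adjust_for_next_round(params, efficiency_level, efficiency_threshold,
                          heavy_load, factor=2):
    """Return the filter parameters to use for the next round (sketch)."""
    if heavy_load:
        # Assumed policy: under memory or network constraints, shrink the
        # filter and accept a lower efficiency garbage collection.
        return replace(params, size_bits=max(1, params.size_bits // factor))
    if efficiency_level < efficiency_threshold:
        # A larger filter lowers the false-positive rate at the cost of
        # more memory and bandwidth.
        return replace(params, size_bits=params.size_bits * factor)
    return params
```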

FIG. 1 illustrates a system 100 for a distributed data storage system according to one or more aspects of the present disclosure. The system 100 includes a client layer 102, a metadata layer 104, and a block server layer 106. The client layer 102 includes clients 108 ₁ and 108 ₂ in the illustrated example. The metadata layer 104 includes metadata servers 110 ₁, 110 ₂, and 110 ₃ in the illustrated example. The block server layer 106 includes block servers 112 ₁, 112 ₂, 112 ₃, and 112 ₄ in the illustrated example. Although the client layer 102 is shown as including two clients 108, the metadata layer 104 is shown as including three metadata servers 110, and the block server layer 106 is shown as including four block servers 112, these examples are not intended to be limiting and in other examples, the client layer 102, the metadata layer 104, and the block server layer 106 may include any number (one or more) of clients 108, metadata servers 110, and block servers 112, respectively.

Although the parts of system 100 are shown as being logically separate, entities may be combined in different fashions. For example, the functions of any of the layers may be combined into a single process or single machine (e.g., a computing device) and multiple functions or all functions may exist on one machine or across multiple machines. When operating across multiple machines, the machines may communicate using a network interface, such as a local area network (LAN) or a wide area network (WAN). In some embodiments, one or more metadata servers 110 may be combined with one or more block servers 112 in a single machine. Entities in the system 100 may be virtualized entities. For example, multiple virtual block servers 112 may be included on a machine. Entities may also be included in a cluster, where computing resources of the cluster are virtualized such that the computing resources appear as a single entity.

The clients 108 include client processes that may exist on one or more physical machines. When the term “client 108” is used in the present disclosure, the action being performed may be performed by a client process. A client process is responsible for storing, retrieving, and/or deleting data in the system 100. A client process may address pieces of data depending on the nature of the storage system and the format of the data stored. For example, the client process may reference data using a client address, which may take different forms. For example, in a storage system that uses file storage, the client 108 may reference a particular volume or partition, and a file name. For object storage, the client address may be a unique object name. For block storage, the client address may be a volume or partition, and a block address. The clients 108 may communicate with the metadata layer 104 using different protocols, such as small computer system interface (SCSI), Internet small computer system interface (iSCSI), fibre channel (FC), common Internet file system (CIFS), network file system (NFS), hypertext transfer protocol (HTTP), web-based distributed authoring and versioning (WebDAV), or a custom protocol.

The block servers 112 store data for clients 108. In some embodiments, data may be broken up into one or more storage units. A storage unit may also be referred to as a data block. Data may be segmented into data blocks. Data blocks may be of a fixed size, may be initially a fixed size but compressed, or may be of a variable size. Data blocks may also be segmented based on the contextual content of the block. For example, data of a particular type may have a larger data block size compared to other types of data. Maintaining segmentation of the blocks on a write (and corresponding re-assembly on a read) may occur in the client layer 102 and/or the metadata layer 104. Also, compression may occur in the client layer 102, the metadata layer 104, and/or the block server layer 106.

In some examples, data may be stored in a volume that is referenced by the client 108. A volume may be made up of one or more volume slices. The data associated with the volume includes a list of volume slices for that volume. A volume slice is a list of blocks for a portion of a volume. A block is the raw data for a volume and may be the smallest addressable unit of data.

The block servers 112 may store data on a storage medium. The storage medium may include different medium formats. For example, electromechanical disk storage or a solid state storage drive may be used. Electromechanical disk storage may include spinning disks that use movable read/write heads to read/write to/from different locations of the spinning disks. Inserting the read/write head at various random locations results in slower data access than if data is read from a sequential location. A solid state storage drive uses a solid state memory to store persistent data. Solid state drives may use microchips that store data in non-volatile memory chips and may contain no moving parts. Solid state drives may also perform random access and parallel reads/writes efficiently.

Data from the clients may be stored non-sequentially. In various implementations, non-sequentially storing data in storage is based upon breaking data up into one or more data blocks. In addition to storing data non-sequentially, data blocks can be stored to achieve substantially even distribution across the storage system. In various examples, even distribution can be based upon a unique block identifier. For example, the data blocks may be stored in the block server layer 106 based on unique block identifiers. A block identifier may also be referred to as a content identifier, and the two terms may be used interchangeably in the present disclosure.

A block identifier can be an identifier that is determined based on the content of the data block, such as by a hash of the content (e.g., a cryptographic hash function (e.g., Skein algorithm) that generates a hash value identified herein as the “block identifier”). The block identifier is unique to that block of data. For example, blocks with the same content have the same block identifier, but blocks with different content have different block identifiers. The values of possible unique identifiers can have a uniform distribution. The bin assignments may be stored in a distributed key-value store across a cluster (e.g., a cluster 302 in FIG. 3) (e.g., in a so-called “zookeeper” database as just one example). Accordingly, storing data blocks based upon the unique identifier, or a portion of the unique identifier, results in the data being stored substantially evenly across drives in the cluster. Because client data, e.g., a volume associated with the client, is spread evenly across all of the drives in the cluster, every drive in the cluster may be involved in the read and write paths of each volume. This configuration may balance the data and load across all of the drives. Such an arrangement may remove hot spots within the cluster, which can occur when the client's data is stored sequentially on any volume.
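
The following sketch illustrates content-derived identifiers and the resulting even placement. SHA-256 from the Python standard library is used as a stand-in because Skein is not available there; the function names and the rule of taking a leading portion of the identifier modulo the number of drives are assumptions for illustration only.

```python
import hashlib
from collections import Counter


def block_id(content: bytes) -> str:
    """Derive a block identifier from the content of a data block.

    Assumption: SHA-256 stands in for the hash function of the system.
    """
    return hashlib.sha256(content).hexdigest()


def drive_for(block_identifier: str, num_drives: int) -> int:
    """Pick a drive from a portion of the identifier (hypothetical rule)."""
    return int(block_identifier[:8], 16) % num_drives


def placement_counts(contents, num_drives: int) -> Counter:
    """Count how many of the given data blocks land on each drive."""
    return Counter(drive_for(block_id(c), num_drives) for c in contents)
```

Because the identifier values are approximately uniformly distributed, calling `placement_counts` on a large number of distinct blocks is expected to yield roughly equal counts per drive, regardless of the order in which a client wrote the data.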

In addition, having data spread evenly across drives in the cluster allows a consistent total aggregate performance of a cluster to be defined and achieved. This aggregation can be achieved, since data for each client is spread evenly through the drives. Accordingly, a client's I/O will involve all the drives in the cluster. Because clients have their data spread substantially evenly through all the drives in the storage system, a performance of the system can be described in aggregate as a single number, e.g., the sum of performance of all the drives in the storage system.

The block servers 112 maintain a mapping between a block identifier and the location of the data block in a storage medium of block server 112. Data blocks with the same block identifiers are not stored multiple times on a block server 112 when received in multiple client write requests.

The metadata layer 104 may store metadata that maps between the client layer 102 and the block server layer 106. For example, metadata servers 110 may map between the client addressing used by the clients 108 (e.g., file names, object names, block numbers, etc.) and block layer addressing (e.g., block identifiers) used in the block server layer 106. The clients 108 may perform access based on client addresses, and block servers 112 may store data based on unique block identifiers for the data.

FIG. 2 illustrates a more detailed example of data storage in the system 100 according to one or more aspects of the present disclosure. A client 108 ₁ and a client 108 ₂ may wish to read data from and/or write data to the distributed data storage system. For example, client 108 ₁ may wish to write data to a volume at a client address 1. The client address 1 may include a target name of the volume and a list of block numbers (e.g., logical block addresses, “LBAs”). The data that client 108 ₁ wishes to write may include data blocks A, F, K, and L (e.g., the content to be written).

The client 108 ₂ may wish to write data at a client address 2. For example, client address 2 may reference a different volume than client address 1 and a different list of block numbers. Other formats of client addressing may also be used. For discussion purposes, the client address 1 and client address 2 may be used to reference the respective data blocks and block numbers (e.g., LBAs). The data that client 108 ₂ wishes to write may include data blocks F, K, B, and A. Accordingly, data blocks A, F, and K are duplicates between the data that the client 108 ₁ and the client 108 ₂ wish to write.

The metadata layer 104 may include a metadata server 110 ₁ and a metadata server 110 ₂. Different metadata servers may be associated with different client addresses. For example, different metadata servers 110 may manage different volumes of data. In this example, metadata server 110 ₁ is designated as handling client address 1, and metadata server 110 ₂ is designated as handling client address 2.

For each client address, a list of block numbers may be stored. The block numbers may represent data blocks associated with the client address. For example, for client address 1, the block identifiers (also referred to as “block IDs” herein) of data blocks A, F, K, and L are stored and associated with client address 1. Each block identifier is associated with a block of data (e.g., block ID A is associated with the data block A, block ID B is associated with the data block B, etc.). Similarly, in metadata server 110 ₂, the client address 2 is associated with block IDs of data blocks F, K, B, and A (e.g., block ID A for data block A, etc.).

The block server layer 106 includes block servers 112 ₁, 112 ₂, and 112 ₃. In an example, the block servers 112 are assigned to different ranges of block identifiers. For example, block server 112 ₁ may be assigned to store data for block identifiers A-E, block server 112 ₂ may be assigned to store data for block identifiers F-J, and block server 112 ₃ may be assigned to store data for block identifiers K-O. In this example, data for a client address may not be stored in sequential locations on a storage medium in a single block server 112. Rather, the data may be stored based on the block identifier determined from the content of the data.

The block server 112 ₁ stores data for block identifier A and block identifier B. Thus, in the example of FIG. 2 the block server 112 ₁ stores data blocks A and B, based on the corresponding block identifiers A and B. Additionally, the block server 112 ₁ may maintain a mapping between each block identifier and the location on the storage medium where the data associated with that block identifier is stored. For example, block identifier A may be mapped to a location 1 where data for block identifier A (e.g., data block A) is stored on block server 112 ₁ and block identifier B may be mapped to a location 2 where data for block identifier B (e.g., data block B) is stored on block server 112 ₁. Additionally, block server 112 ₂ stores data for block identifier F in location 3 on block server 112 ₂, and block server 112 ₃ stores data for block identifiers K and L in locations 4 and 5, respectively, in block server 112 ₃.
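
The example of FIG. 2 may be modeled with the short sketch below. The single-letter identifiers are the figure's labels (actual block identifiers would be hash values), and the dictionary layout, the server keys such as `"112_1"`, and the function names are hypothetical.

```python
# Ranges of block identifiers assigned to each block server in the example.
BLOCK_SERVER_RANGES = {
    "112_1": ("A", "E"),
    "112_2": ("F", "J"),
    "112_3": ("K", "O"),
}


def server_for(block_id: str) -> str:
    """Return the block server whose identifier range covers block_id."""
    for server, (low, high) in BLOCK_SERVER_RANGES.items():
        if low <= block_id <= high:
            return server
    raise KeyError(f"no block server assigned for {block_id!r}")


def write(metadata, block_servers, client_address, block_ids):
    """Record the client's block list and store each block at most once."""
    metadata[client_address] = list(block_ids)
    for block_id in block_ids:
        stored = block_servers.setdefault(server_for(block_id), {})
        # A block whose identifier is already present is not stored again.
        stored.setdefault(block_id, f"<data block {block_id}>")


metadata, block_servers = {}, {}
write(metadata, block_servers, "client address 1", ["A", "F", "K", "L"])
write(metadata, block_servers, "client address 2", ["F", "K", "B", "A"])
```

After both writes, eight block references exist in the metadata but only five data blocks are stored, since blocks A, F, and K are shared between the two client addresses.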

In some examples, the data blocks for a client address are not stored in sequential locations on a storage medium 114. For example, for client address 1, data block A may be stored on block server 112 ₁ in storage medium 114 ₁, data block F is stored on block server 112 ₂ in storage medium 114 ₂, and data blocks K and L are stored on block server 112 ₃ in storage medium 114 ₃. In some examples, the storage medium 114 in block server 112 may be a solid state device, such as non-volatile memory (e.g., flash memory). The solid state device may be electrically programmed and erased. The data blocks may be stored on the solid state device and persist when block server 112 is powered off. Solid state devices allow random access to data in an efficient manner and include no physical moving parts. For example, the random access is more efficient using solid state devices than if a spinning disk is used. Thus, data stored in data blocks for a client address in a non-contiguous address space and even different block servers 112 may still be accessed efficiently. In some examples, the storage medium 114 may include multiple solid state drives (e.g., flash memory drives). Each drive may store data (e.g., data blocks) for a portion of the block identifiers. Although a solid state device is described, it will be understood that spinning disks may also be used with particular embodiments.

Due to the use of the metadata server 110 to abstract the client addresses, block servers 112 may be unaware of which clients 108 are referencing which data blocks. Accordingly, it may be undesirable to allow block servers 112 to remove an overwritten or deleted block because block servers 112 do not know if other clients 108 are using this data block. Metadata server 110 has information on which data blocks are in-use by clients 108 and may communicate with block servers 112 to provide information on which data blocks are in-use and which are not in-use. A data block is “in-use” if the data block is currently referenced by a client 108 and is “not in-use” if the data block is not referenced by any clients 108.

Garbage collection may refer to an algorithm that is periodically run to identify data that is no longer in-use and then delete this data. The present disclosure provides techniques for performing garbage collection in an efficient manner in system 100.
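
The in-use determination can be illustrated with exact sets, setting aside the compact filters used to communicate it. In this sketch the metadata is a hypothetical dictionary from client address to a list of block identifiers, and the function names are assumptions.

```python
def in_use_blocks(metadata):
    """Union of the block identifiers referenced by any client address."""
    used = set()
    for block_ids in metadata.values():
        used.update(block_ids)
    return used


def not_in_use_blocks(stored_ids, metadata):
    """Stored blocks that are not referenced by any client address."""
    return set(stored_ids) - in_use_blocks(metadata)
```

For example, if client address 1 is overwritten so that it references only block A while client address 2 still references F, K, B, and A, then of the stored blocks A, B, F, K, and L only block L is not in-use; blocks F and K remain in-use because another client address still references them.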

FIG. 3 illustrates a system 300 including a cluster 302 of storage nodes 303 coupled to a content manager 320 that performs garbage collection according to one or more aspects of the present disclosure. The cluster 302 includes a plurality of storage nodes 303, and each storage node 303 may include one or more slice services 306 as well as one or more block services 309. One or more volumes 308 may be maintained by a slice service 306.

A client 314 may correspond to the client 108, the slice services 306 may correspond to the metadata server 110, and the block service 309 may correspond to the block server 112 illustrated in FIG. 1. The client 314 may store data to, retrieve data from, and/or modify data stored at the cluster 302. Each client 314 may be associated with a volume. In some examples, only one client 314 accesses data in a volume. In some examples, multiple clients 314 may access data in a single volume. The slice services 306 and/or the client 314 may break data into data blocks, such as discussed above with respect to FIGS. 1 and 2. Block services 309 and slice services 306 may maintain mappings between the client's address and the eventual physical location of the data block in respective storage media of one or more storage nodes 303. A volume includes these unique and uniformly random identifiers, and so a volume's data is also evenly distributed throughout the cluster.

The slice services 306 may store metadata that maps between clients 314 and block services 309. For example, slice services 306 may map between the client addressing used by clients 314 (e.g., file names, object names, block numbers, etc. such as LBAs) and block layer addressing (e.g., block identifiers) used in block services 309. Further, block services 309 may map between the block layer addressing (e.g., block identifiers) and the physical location of the data block on one or more storage devices. The blocks may be organized within bins maintained by the block services 309 for storage on physical storage devices (e.g., SSDs). A bin may be derived from the block ID for storage of a corresponding data block by extracting a predefined number of bits from the block identifiers introduced above.

In some embodiments, the bin may be divided into buckets or “sublists” by extending the predefined number of bits extracted from the block identifier. A bin identifier may be used to identify a bin within the system. The bin identifier may also be used to identify a particular block service 309 (e.g., block service 309 ₁-309 _(n) in the example of FIG. 3) and associated storage device (e.g., SSD). A sublist identifier may identify a sublist within the bin, which may be used to facilitate network transfer (or syncing) of data among block services in the event of a failure or crash of a storage node. Accordingly, a client can access data using a client address, which is eventually translated into the corresponding unique identifiers that reference the client's data at the storage node 303.
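
The derivation of bin and sublist identifiers may be sketched as follows. Taking the bits from the most significant end of the identifier, and the widths of 16 bits for a bin and 4 additional bits for a sublist, are assumptions chosen for illustration; the disclosure states only that a predefined number of bits is extracted and then extended.

```python
def bin_id(block_identifier: bytes, bin_bits: int = 16) -> int:
    """Extract the leading bin_bits bits of the block identifier."""
    value = int.from_bytes(block_identifier, "big")
    total_bits = len(block_identifier) * 8
    return value >> (total_bits - bin_bits)


def sublist_id(block_identifier: bytes, bin_bits: int = 16,
               sublist_bits: int = 4) -> int:
    """Extend the extraction by sublist_bits bits and keep the extension."""
    value = int.from_bytes(block_identifier, "big")
    total_bits = len(block_identifier) * 8
    extended = value >> (total_bits - bin_bits - sublist_bits)
    return extended & ((1 << sublist_bits) - 1)
```

With the 32-bit example identifier `bytes.fromhex("abcdef12")`, `bin_id` returns `0xabcd` and `sublist_id` returns `0xe`, so the data block would belong to sublist `0xe` within bin `0xabcd`.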

The above structure allows storing of data evenly across the cluster of storage devices (e.g., SSDs). For each volume hosted by a slice service 306, a list of block identifiers may be stored with one block identifier for each logical block on the volume. Each volume may be replicated between one or more slice services 306 and/or storage nodes 303, and the slice services for each volume may be synchronized between each of the slice services hosting that volume. Accordingly, failover protection is provided in case a slice service 306 fails, such that access to each volume may continue during the failure condition.

Although parts of the system 300 are shown as being logically separate, entities may be combined in different ways. For example, functions discussed in the present disclosure may be combined into a single process or single machine (e.g., a computing device) and multiple functions or all functions may exist on one machine or across multiple machines. Additionally, when operating across multiple machines, the machines may communicate using a network interface, such as a local area network (LAN) or a wide area network (WAN). In some implementations, slice services 306 may be combined with one or more block services 309 in a single machine. Additionally or alternatively, entities in system 300 may be virtualized entities. Entities may also be included in the cluster 302, where computing resources of the cluster are virtualized such that the computing resources appear as a single entity.

In the example illustrated in FIG. 3, the cluster 302 may include a storage node 303 ₁ including a slice service 306 ₁, a storage node 303 ₂ including a slice service 306 ₂, and a storage node 303 _(n) including a slice service 306 _(n). The slice service 306 ₁ includes volumes 308 ₁ and 308 ₂, the slice service 306 ₂ includes volume 308 ₃, and the slice service 306 _(n) includes volumes 308 ₄ and 308 ₅. These are merely examples, and it should be understood that a storage node 303 may include any number of slice services (e.g., one or more slice services), and a slice service may include any number of volumes (e.g., one or more volumes).

Each slice service 306 may have a respective storage operating system (OS) 310. Moreover, one of the storage OS 310 may operate as a cluster manager to other slice services 306 within the cluster. Should that slice service 306 with the storage OS 310 operating as the cluster manager fail, another storage OS 310 may assume that role in its place. The storage OS 310 may track data usage per volume, per client. A client may access multiple volumes, and multiple clients may access the same volume. The storage OS 310 may store the usage information per client, per volume into a metadata datastore (which may be within main memory of a storage node 303, for example, or a storage device such as an SSD associated with a slice service 306 as another example).

As discussed above, client 314 may write data to, read data from, and/or delete data stored in the system 300. Data may be deleted from the system 300 when a client address at which the data is stored is overwritten with other data or when a client address becomes invalid (e.g., a file or object is deleted). There is not a one-to-one mapping between the client addresses and stored data blocks because multiple client addresses may have the same data block referenced by the same block identifier. It may be desirable for the system 300 to only delete data that is no longer needed. For example, a data block should not be deleted if it is being referenced by another client address.

Garbage collection may refer to an algorithm that is periodically run to identify data that is no longer in-use and then delete this data. The content manager 320 may perform garbage collection on objects stored in the system 100, 300 by tracking which data blocks are in-use and/or are not in-use and accordingly deleting the data blocks that are not in-use. The content manager 320 includes a monitoring module 322, an adjustment module 324, and a garbage collector 326. The monitoring module 322 may monitor the cluster 302 and mark which data blocks are in-use (and therefore should not be deleted). The adjustment module 324 may adjust parameters related to the garbage collection to ease the workload on the cluster 302. The garbage collector 326 may delete the data blocks that are not in-use. As will be discussed further below, the content manager 320 may perform one or more rounds of garbage collection on the cluster 302.

Aspects of the content manager 320 (e.g., the monitoring module 322, theadjustment module 324, and/or the garbage collector 326) may beincorporated into the cluster 302 (e.g., storage node 303) or in thestorage OS 310. In the present disclosure, reference to the contentmanager 320 performing an action (e.g., receiving, adjusting,performing, reducing, increasing, monitoring, transmitting, determining,storing, etc.) may refer to the storage OS 310 and/or the cluster 302(e.g., or one or more components within the cluster 302 such as thestorage node(s) 303, the slice service(s) 306, the volume(s) 308, theblock service(s) 309, and the like) performing such action.

In some examples, the content manager 320 may monitor one or more bloom filters and perform, based on the one or more bloom filters, garbage collection on the cluster 302. A bloom filter is a type of bit field that may be used for membership testing. A bloom filter is a compact representation of a set of data that can be used to later test for the presence of individual elements. For example, the elements A, B, C, and D may be represented in a bloom filter. A block service 309 can test whether any of the elements are in the bloom filter. However, the bloom filter may not be used to generate the list of elements A, B, C, and D. In exchange for the reduction in size, a small possibility of an error may be introduced. For example, a small percentage chance exists that an element may appear to be present when it is in fact not. This chance of error may be controlled by, for example, selecting a size for the bloom filter based on a number of possible elements that can be stored on block service 309 and/or selecting a particular target fullness of the filter. Additionally, an error may not be fatal because the result of the error is that an element will just not be deleted when it is actually “not in-use”. Accordingly, an error in which a data block is deleted when it is still being referenced by client 108 does not occur.

The slice service 306 may construct a single bloom filter including allmetadata on the slice service 306 for all corresponding block services309 or may construct multiple bloom filters for subsets of metadata onthe slice service 306 for each block service 309. The more metadata thatis encompassed by the bloom filter, the larger the bloom filter is,which may consume more memory and more network bandwidth to transmit. Ifthe slice service 306 constructs multiple bloom filters (e.g., one bloomfilter for each block service 309 or multiple bloom filters for eachblock service 309), then the slice service 306 may construct the bloomfilters serially or in parallel. If the slice service 306 constructsbloom filters in parallel, more memory may be consumed, but the numberof times metadata is read to build the bloom filters is reduced.Similarly, if the slice service 306 combines bloom filters beforeprocessing on block service 309, this may allow for fewer passes throughthe list of data blocks on block service 309, but may use larger bloomfilters and more memory.

The slice services 306 may transmit one or more bloom filters to one ormore block services 309, each bloom filter indicating to a block service309 whether a particular data block referenced by the block service 309is in-use (e.g., referenced by any clients 314). To construct a givenbloom filter, for example, the slice services 306 may construct a bloomfilter and initialize all entries in the bloom filter to 0. If the sliceservice 306 determines that a data block is in-use, the slice service306 marks a set of entries in the bloom filter corresponding to the datablock to a value of 1. In a bloom filter, a data block that is in-use ismarked as in-use, and a data block that is not in-use is marked eitheras in-use or as not in-use. Accordingly, use of the bloom filter mayprovide for a false positive, but not a false negative. Each bloomfilter may have a false-positive rate, as will be discussed in moredetail in the present disclosure.

FIG. 4 illustrates an example bloom filter 400 in accordance with one or more aspects of the present disclosure. The bloom filter 400 may have various parameters, such as a size of the bloom filter (e.g., a total number of bits in the bloom filter), a target fullness of the bloom filter (e.g., a total number of bits that is set to 1), and a number of hash functions used when constructing the bloom filter.

In FIG. 4, “m” in equation (1) below may represent the number of entries. The entries may include block IDs for which “k” hash functions are computed, where “k” is provided in equation (1) below. Accordingly, the number “n” (in equation (1) below) of bits in the filter may be the number of columns (e.g., five), the number “m” of entries may be the number of rows (e.g., three) above the final row (which represents the filled Bloom filter), and the number “k” of hashes may be the maximum number of bits in a row representing a single entry set to 1 (in this case, two; the first row has only one bit set to 1 indicating that two hash functions for that entry yielded the same bit).

The bloom filter 400 includes a plurality of entries (e.g., block IDs) used to indicate whether particular data blocks are in-use or are not in-use. The bloom filter 400 includes a bit field with twenty bits (as just one example of a bloom filter), each bit having a value of 0 or 1. Although the bloom filter 400 is composed of twenty bits, other bloom filters may have more than or fewer than twenty bits. In other examples, a bloom filter may have many more bits (e.g., 130,000 bits as just one example). The bloom filter 400 may include bit values indicating which data blocks are in-use in one or more block services 309, with a 1 indicating that the corresponding data block is in-use and a 0 indicating that the data block is not in-use. As discussed above, the bloom filter 400 may have a false-positive rate referring to a data block that is marked as being in-use when it is not actually in-use.

One or more entries in the bloom filter 400 may correspond to a contentidentifier that identifies an underlying data block referenced by ablock service 309. For example, a content identifier may map to kentries (e.g., three entries) in the bloom filter 400 through kdifferent hashing functions. In an example, the monitoring module 322may use k hashes per content identifier and accordingly may use k hashfunctions when constructing the bloom filter 400, where k is a numbergreater than zero (e.g., one, two, three, four, or more). For example,the monitoring module 322 may scan the metadata including a set ofcontent identifiers, and identify a particular content identifier of theset that identifies a data block.

If k is three, the particular content identifier may be an input into each of the three hash functions, with each result of the hash function mapping to an entry in the bloom filter 400. For example, a first hash function may map the particular content identifier to an entry 402 of the bloom filter 400, a second hash function may map the particular content identifier to an entry 406 of the bloom filter 400, and a third hash function may map the particular content identifier to an entry 404 of the bloom filter 400. The monitoring module 322 may accordingly set the entries 402, 404, and 406 to a value of 1, thereby indicating that the data block identified by the particular content identifier is in-use. A data block may be considered to be in-use if all entries in the bloom filter 400 corresponding to the content identifier are set to a value of 1. The monitoring module 322 may continue to scan the slice services 306 and identify additional content identifiers for filling the bloom filter 400. For example, if the monitoring module 322 identifies another content identifier in the slice service 306, the monitoring module 322 may set another three bits (if k=3) in the bloom filter to 1 until a target fullness of the filter is satisfied.
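
The marking and membership test described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class name `BloomFilter`, its method names, and the use of salted SHA-256 digests to stand in for the k hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Bit field in which each content identifier maps to k entries."""

    def __init__(self, n_bits: int, k_hashes: int) -> None:
        self.n_bits = n_bits
        self.k_hashes = k_hashes
        self.bits = [0] * n_bits  # all entries are initialized to 0

    def _positions(self, content_id: bytes) -> list:
        # Assumption: the k hash functions are modeled by salting SHA-256
        # with the hash index; any k independent hashes would serve.
        positions = []
        for i in range(self.k_hashes):
            digest = hashlib.sha256(bytes([i]) + content_id).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.n_bits)
        return positions

    def mark_in_use(self, content_id: bytes) -> None:
        # Set every entry that the content identifier hashes to.
        for pos in self._positions(content_id):
            self.bits[pos] = 1

    def maybe_in_use(self, content_id: bytes) -> bool:
        # True only if all k entries are 1. This can be a false positive,
        # but a marked identifier never tests as not in-use.
        return all(self.bits[pos] for pos in self._positions(content_id))
```

Because two hash functions may select the same bit, marking one identifier sets at most k bits, not necessarily exactly k.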

A fullness of a filter refers to a percentage of bits in the bloom filter that is set to a value of 1. For example, the bloom filter 400 has a fullness of forty-five percent because nine bits out of the twenty bits are set to a value of 1 in the bloom filter 400. The slice service 306 may determine whether a bloom filter satisfies a target fullness. A bloom filter satisfies the target fullness if, for example, the bloom filter's fullness is greater than the target fullness. The target fullness may be, for example, forty percent (as just one example of a numeric value). If a bloom filter 400 satisfies the target fullness, the slice service 306 may transmit the bloom filter to the block service 309 assigned to store data associated with the slice service 306.
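
The fullness check can be illustrated with the twenty-bit example. The function names below are hypothetical, and the particular bit pattern is arbitrary; only the count of nine set bits out of twenty is taken from the text.

```python
def fullness(bits):
    """Fraction of bits in the filter that are set to 1."""
    return sum(bits) / len(bits)


def satisfies_target_fullness(bits, target):
    """A filter satisfies the target when its fullness exceeds it."""
    return fullness(bits) > target


# Nine of twenty bits set, as in the example above (positions are arbitrary).
example_bits = [1] * 9 + [0] * 11
ready_to_transmit = satisfies_target_fullness(example_bits, target=0.40)
```

With a target fullness of forty percent, the forty-five percent full filter qualifies for transmission to the block service.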

The monitoring module 322 may continue to scan the cluster 302 for sliceservices 306, construct a new bloom filter for the slice service 306,and set values in the bloom filter as discussed in the presentdisclosure. In some examples, the block service 309 receives a bloomfilter corresponding to every slice service 306 in the cluster 302.After receiving a bloom filter corresponding to every slice service 306in the cluster 302, the block service 309 may mark the in-use datablocks in accordance with the received bloom filter(s).

If a large number of data blocks is marked as being in-use when theyactually are not, this may lead to a high false-positive rate. A highfalse-positive rate may result in large amounts of unnecessary(overwritten or deleted) data remaining on a system when overwrites areoccurring between rounds of garbage collection, sometimes far in excessof even the in-use data on the system. For example, if the bloom filter400 has all bits set to 1, then all the data blocks corresponding to thebloom filter are indicated as being in-use and the false-positive rateis one hundred percent. In this scenario, the garbage collector 326would not delete any of these data blocks. In some examples, garbagecollection using bloom filters may scale badly with the size and numberof nodes in the cluster 302, without direct modification of bloom filterparameters by the user.

If the false-positive rate is high, the garbage collector 326 may have a very low efficiency rate when performing garbage collection in the system because those data blocks that are incorrectly marked as being in-use (when they are not) will not be deleted from the system. The present disclosure provides techniques for reducing the false-positive rate for a round of garbage collection, potentially resulting in a higher efficiency garbage collection process.

FIG. 4 will be discussed relative to FIG. 5 (and vice versa) to betterunderstand concepts related to performing garbage collection based oninformation in one or more bloom filters. FIG. 5 illustrates a flowdiagram of a method 500 of performing garbage collection using one ormore bloom filters in a distributed data storage system according to oneor more aspects of the present disclosure. Blocks of the method 500 canbe executed by a computing device (e.g., a processor, processingcircuit, and/or other suitable component, such as of a storage node303). For example, the slice service 306 and/or the content manager 320(e.g., one or more components, such as the monitoring module 322, theadjustment module 324, and/or the garbage collector 326) may execute oneor more blocks of the method 500. As illustrated, the method 500includes a number of enumerated blocks, but embodiments of the method500 may include additional blocks before, after, and in between theenumerated blocks. In some embodiments, one or more of the enumeratedblocks may be omitted or performed in a different order.

At block 502, the method 500 includes monitoring a set of filterscorresponding to a set of slice services, the set of slice servicesresiding in a cluster and including a set of content identifiers, andthe set of filters indicating whether a set of data blocks referenced bythe set of content identifiers is in-use. In an example, the monitoringmodule 322 may monitor the efficiency of a garbage collection processusing one or more bloom filters. The monitoring module 322 may monitorthe set of filters transmitted from the set of slice services 306 to theset of block services 309 referencing the set of data blocks.

A content identifier may be, for example, a block identifier. The filter may be a bloom filter. Although a bloom filter is discussed, other filters may be used. In other examples, cache filters or extensions to bloom filters (e.g., counting bloom filters, distributed bloom filters, bloomier filters, parallel partitioned bloom filters, stable bloom filters, scalable bloom filters, spatial bloom filters, layered bloom filters, and/or attenuated bloom filters) may be used.

In some examples, the set of slice services 306 may scan the volumes hosted by the set of slice services and construct, based on the scan, bloom filters indicating which data blocks corresponding to the set of slice services are in-use. A data block corresponds to a slice service if the slice service includes a content identifier that identifies the data block.

At block 504, the method 500 includes determining a false-positive ratefor the set of slice services. In some examples, the monitoring module322 may receive the false-positive rate for the set of slice servicesfrom, for example, the slice service 306. In some examples, themonitoring module 322 may determine the false-positive rate for the setof slice services by calculating a probability of false positives for atleast one filter of the set of filters. In this example, for each filterof the set of filters, the monitoring module 322 may determine theprobability that a bit is not set to 1, which may be calculated inaccordance with equation (1):

$$Q = \left(1 - \frac{1}{n}\right)^{km}, \text{ which is equivalent to } e^{-\frac{km}{n}} \qquad (1)$$
where Q represents the probability that a bit is not set to 1 in the filter, n represents the size of the filter array, m represents the number of entries (e.g., block IDs) in the filter, and k represents the number of hash functions corresponding to a content identifier (e.g., k>=1). Referring to equation (1), km actions of setting a bit take place, in a pseudo-random fashion in accordance with the k hash functions for m entries into the filter, each having a (1-1/n) probability of not setting a particular bit. Additionally, the adjustment module 324 may adjust the filter parameters to achieve a target fullness for individual filters or an overall round of garbage collection. The target fullness may be equal to the value of Q (e.g., as measured directly by counting the bits that are set; equation (1) amounts to a statistical prediction of this value).

The monitoring module 322 may use equation (1) to determine the false-positive rate for a filter (of the set of filters), which may be calculated in accordance with equation (2):
$$P = \left(1 - e^{-\frac{km}{n}}\right)^{k} \qquad (2)$$
where P represents the false-positive rate of a filter, n represents the size of the bloom filter array, m represents the number of entries (e.g., block IDs) in the filter, and k represents the number of hash functions corresponding to a content identifier.

The relationship between equations (1) and (2) may be expressed in accordance with equation (3):
$$P = (1 - Q)^{k}, \text{ which is equivalent to } \left(1 - e^{-\frac{km}{n}}\right)^{k} \qquad (3)$$
where the variables in equation (3) are defined in accordance with equations (1) and (2) above.
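
Equations (1) through (3) can be evaluated numerically as sketched below. The function names and the sample values of n, m, and k are assumptions chosen for illustration; the formulas themselves are those given above.

```python
import math


def prob_bit_unset(n: int, m: int, k: int) -> float:
    """Equation (1), product form: Q = (1 - 1/n)^(k*m)."""
    return (1.0 - 1.0 / n) ** (k * m)


def prob_bit_unset_exp(n: int, m: int, k: int) -> float:
    """Equation (1), exponential form: Q = e^(-k*m/n)."""
    return math.exp(-k * m / n)


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Equations (2) and (3): P = (1 - Q)^k using the exponential form of Q."""
    return (1.0 - prob_bit_unset_exp(n, m, k)) ** k


# Hypothetical filter: 130,000 bits, 20,000 entries, 3 hashes per entry.
n, m, k = 130_000, 20_000, 3
q_product = prob_bit_unset(n, m, k)
q_exp = prob_bit_unset_exp(n, m, k)
p = false_positive_rate(n, m, k)
```

For filter sizes in this range the two forms of Q agree closely, which is why the exponential form is used in equation (2).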

In some examples, the monitoring module 322 determines the false-positive rate for each filter of the set of filters and combines the false-positive rates to determine a worst-case false-positive rate for the set of slice services constructing the filters. In some examples, the monitoring module 322 may determine the worst-case false-positive rate for the cluster overall. Determining a worst-case false-positive rate for the set of slice services may include calculating a false-positive rate for each filter independently, with an assumption that no filter in the set of filters stores related data that would affect the false-positive rate for another filter in the set of filters.

At block 506, the method 500 includes determining whether thefalse-positive rate satisfies a performance threshold. The performancethreshold may indicate a point at which, for example, the efficiency ofthe garbage collection process is too low or the cluster has a largeamount of data stored and it may be desirable to remove the data that isnot in-use. The monitoring module 322 may determine whether thefalse-positive rate satisfies the performance threshold. In someexamples, the false-positive rate satisfies the performance threshold ifthe false-positive rate exceeds the performance threshold. Theperformance threshold may be, for example, seventy-five percent (as justone example of a numeric value) for a metadata service. The performancethreshold may be, for example, ninety percent (as just one example of anumeric value) for the overall cluster.

In an example, the monitoring module 322 determines that thefalse-positive rate satisfies the performance threshold and in response,transmits an alert to the monitoring module 322. In this example, themonitoring module 322 may determine, based on receiving the alert, thatthe false-positive rate satisfies the performance threshold (e.g., thefalse-positive rate exceeds the performance threshold). The performancethreshold may be considered a number or percentage that is high enoughto warrant adjusting at least one of the filter parameters in order toreduce the false-positive rate. A trade-off may exist between reducingthe false-positive rate of one or more filters and the systemperformance, as will be discussed further below in, for example, aspectsof FIG. 7. For example, as will be discussed further below, increasing afilter size may reduce the false-positive rate, but may consume morememory, potentially resulting in degraded system performance.

If the false-positive rate for the set of slice services satisfies theperformance threshold, then it may be desirable to reduce thefalse-positive rate of the set of slice services to improve garbagecollection efficiency. In this instance, the method 500 may proceed toblock 508. In contrast, if the false-positive rate for the set of sliceservices does not satisfy the performance threshold, then it may beunnecessary to adjust the false-positive rate of the set of sliceservices because garbage collection may occur with an acceptable degreeof efficiency. In this instance, the method 500 may proceed to block510.

The false-positive rate for the set of slice services may be based on afalse-positive rate for each filter of the set of filters correspondingto the set of slice services. Each filter may independently have its ownintrinsic false-positive rate. If the adjustment module 324 reduces afalse-positive rate of a filter corresponding to the set of sliceservices, the false-positive rate of the set of slice services is alsoreduced. Similarly, if the adjustment module 324 increases afalse-positive rate of a filter corresponding to the set of sliceservices, the false-positive rate of the set of slice services is alsoincreased.

At block 508, the method 500 includes reducing the false-positive rate of at least one filter of the set of filters. In an example, the adjustment module 324 reduces the false-positive rate of a filter by adjusting one or more parameters of the filter. A filter may have a set of parameters including, for example, a size of the filter, a target fullness of the filter, and/or a number of hashes per content identifier used for constructing the filter.

In some examples, the adjustment module 324 may reduce the false-positive rate of the filter by increasing the size of the filter (e.g., increasing the number of bits stored in the filter, identified by the parameter “n” in the equation (1) above). If the adjustment module 324 increases the filter size, then the number of filters transmitted from the set of slice services to the block services may be reduced. For example, if the adjustment module 324 increases the filter size by a factor of two (e.g., from 130,000 bits to 260,000 bits as an example), then the number of filters transmitted from the set of slice services to the block services may be cut in half. Although increasing the filter size may improve the false-positive rate for the set of slice services, trade-offs exist between improving the false-positive rate and system performance. For example, increasing the filter size may result in a penalty in terms of memory consumption (e.g., consumes more memory) and communication (e.g., uses higher bandwidth).
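
The trade-off can be made concrete by comparing two filter sizes for the same number of entries. This is a sketch under stated assumptions: the function name `size_trade_off`, the entry count of 20,000, and k = 3 are hypothetical, and memory is approximated as one bit per filter entry with no overhead.

```python
import math


def size_trade_off(n_bits: int, m_entries: int, k_hashes: int):
    """Return (false-positive rate per equation (2), approximate bytes)."""
    p = (1.0 - math.exp(-k_hashes * m_entries / n_bits)) ** k_hashes
    memory_bytes = math.ceil(n_bits / 8)  # assumption: packed bit array
    return p, memory_bytes


# Same hypothetical workload, filter size doubled from 130,000 to 260,000 bits.
small = size_trade_off(130_000, 20_000, 3)
large = size_trade_off(260_000, 20_000, 3)
```

Under these assumptions the larger filter has the lower false-positive rate but occupies twice the memory and takes twice as many bytes to transmit.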

In some examples, the adjustment module 324 may reduce thefalse-positive rate of a filter by decreasing the target fullness of thefilter. If the adjustment module 324 decreases the target fullness, thenthe filter may reach its target fullness quicker. For example, if theslice service 306 is barely filling up a filter with 1s, it may bedesirable to decrease the target fullness of the filter to transmitfilters more efficiently. A lower target fullness may correspond to afewer number of slice services 306 and/or volumes hosted on the sliceservices 306. In some examples, the adjustment module 324 may reduce thefalse-positive rate of a filter by decreasing the size of the filter,while leaving the target fullness alone.

In some examples, the adjustment module 324 may reduce thefalse-positive rate of the filter by decreasing the number of hashes(e.g., identified by the parameter “k” in equation (1) above) percontent identifier used for constructing the filter. If the adjustmentmodule 324 decreases the number of hashes per content identifier usedfor constructing the filter, then fewer entries in the filter are usedfor any particular data block identified by the content identifier.

At block 510, the method 500 includes updating the set of filters inaccordance with the reduced false-positive rate. The false-positive ratefor the updated set of filters may be the reduced false-positive rate.Additionally, the updated set of filters may be in accordance with theadjusted set of filter parameters discussed in relation to block 508.For example, if the adjustment module 324 decreased the target fullnessto reduce the false-positive rate of a filter in relation to block 508,then the filter has a target fullness of the updated target fullness. Ifthe adjustment module 324 increased the size of a filter to reduce thefalse-positive rate of the filter in relation to block 508, then thefilter has a size of the updated filter size. If the adjustment module324 decreased the number of hashes per content identifier used forconstructing the filter to reduce the false-positive rate of the filterin relation to block 508, then the number of hashes per contentidentifier used for constructing the filter is updated in accordancewith the adjustment.

At block 512, the method 500 includes determining whether there isanother set of slice services in the cluster to process. It may bedesirable to process each of the slice services in the cluster andmonitor the filters corresponding to each of these slice services toensure that the underlying data blocks are not referenced by any of theclients in the cluster. If there is another set of slice services in thecluster to process, then the method 500 may proceed back to block 502.In contrast, if there is not another set of slice services in thecluster to process, then the method 500 may proceed to block 514.

At block 514, the method 500 includes performing garbage collection on the one or more sets of data blocks in accordance with the one or more sets of filters. The sets of filters in block 514 may refer to the set of monitored filters in block 502 or the updated set of filters in block 510. In an example, in a first round of garbage collection, the garbage collector 326 performs garbage collection on the one or more sets of data blocks. During a round of garbage collection, the garbage collector 326 may remove data blocks that have been indicated as not in-use by the filter and leave those data blocks that have been indicated as in-use by the filter. After the first round of garbage collection, the garbage collector 326 may perform another round of garbage collection if the method 500 proceeds back to block 504, in which a false-positive rate for the set of filters may be determined. In this example, the false-positive rate may have been reduced in accordance with execution of the block 508.
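
A round of garbage collection on one block service can be sketched as a single pass over its stored blocks. The function name `sweep` is hypothetical, and each filter is modeled simply as a callable that reports whether a block identifier is marked in-use; exact sets stand in for bloom filters here, so this toy run has no false positives.

```python
def sweep(stored_block_ids, filters):
    """Split stored blocks into those kept and those removed.

    A block is kept if any received filter indicates it is in-use;
    otherwise it is removed as not in-use.
    """
    kept, removed = [], []
    for block_id in stored_block_ids:
        if any(is_marked(block_id) for is_marked in filters):
            kept.append(block_id)
        else:
            removed.append(block_id)
    return kept, removed


# Assumption: one filter per slice service, modeled with exact sets.
filter_from_slice_1 = {"b1"}.__contains__
filter_from_slice_2 = {"b3"}.__contains__
kept, removed = sweep(["b1", "b2", "b3", "b4"],
                      [filter_from_slice_1, filter_from_slice_2])
```

With real bloom filters, some unreferenced blocks would occasionally land in `kept`; that residue is what the false-positive rate measures.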

In some examples, the monitoring module 322 may set garbage collectionparameters at a single-node level or single slice-service level. Themonitoring module 322 may set the garbage collection parameters in theabsence of a previous round of garbage collection (e.g., at block 502)or if, for example, the metadata (e.g., list of content identifiers) hasbeen rebalanced on the system following metadata drive addition orremoval, or other conditions that may reduce the reliability of theprevious round of garbage collection. The garbage collection parametersmay be the parameters for a filter (e.g., size of the filter, targetfullness of the filter, and/or the number of hashes per contentidentifier used for constructing the filter).

FIG. 6 illustrates a flow diagram of a method 600 of determining afalse-positive rate according to one or more aspects of the presentdisclosure. Blocks of the method 600 can be executed by a computingdevice (e.g., a processor, processing circuit, and/or other suitablecomponent, such as of a storage node 303). For example, the sliceservice 306 and/or the content manager 320 (e.g., one or morecomponents, such as the monitoring module 322, the adjustment module324, and/or the garbage collector 326) may execute one or more blocks ofthe method 600. As illustrated, the method 600 includes a number ofenumerated blocks, but embodiments of the method 600 may includeadditional blocks before, after, and in between the enumerated blocks.In some embodiments, one or more of the enumerated blocks may be omittedor performed in a different order.

The monitoring module 322 may determine a false-positive rate in accordance with the method 600 if, for example, it is difficult to determine the false-positive rate corresponding to a slice service. It may be difficult to determine the false-positive rate corresponding to a slice service if, for example, the slice service 306 has just started up and not enough dynamic information is available on the volumes hosted on the slice service 306, or the slice services 306 have been rebalanced.

At block 602, the method 600 includes determining a size of a set ofmetadata files including a set of content identifiers corresponding to aset of slice services, the set of content identifiers identifying a setof data blocks. In an example, the monitoring module 322 determines thesize of the set of metadata files, which may correspond to the metadatalayer 104 in FIGS. 1 and 2 and/or the slice services 306 in FIG. 3.

At block 604, the method 600 includes estimating, based on the size, a number of data blocks that are in-use for the set of slice services. In an example, the monitoring module 322 estimates the number of data blocks referenced by a metadata service based on the size of the metadata files.

At block 606, the method 600 includes estimating, based on the number of data blocks, a false-positive rate for the set of slice services. In an example, the monitoring module 322 may use the estimated number of data blocks to estimate a false-positive rate for the set of slice services. The monitoring module 322 may determine a number of filters that will be transmitted based on the estimated number of data blocks, each filter having at least one garbage collection parameter. A garbage collection parameter may also be referred to as a filter parameter in the present disclosure. A garbage collection parameter may include, for example, a size of the filter, a target fullness of the filter, and/or a number of hashes per content identifier.
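
Blocks 602 through 606 can be sketched as one estimation function. The disclosure does not state how metadata size maps to a block count or how many entries a filter holds, so `bytes_per_content_id` and `entries_per_filter` are purely hypothetical inputs; the false-positive rate is then taken from equation (2).

```python
import math


def estimate_false_positive_rate(metadata_bytes: int,
                                 bytes_per_content_id: int,
                                 entries_per_filter: int,
                                 n_bits: int,
                                 k_hashes: int):
    """Return (estimated blocks, estimated filters, estimated rate)."""
    # Block 604: estimate in-use data blocks from the metadata size.
    est_blocks = metadata_bytes // bytes_per_content_id
    # Block 606: estimate how many filters will be transmitted.
    num_filters = math.ceil(est_blocks / entries_per_filter)
    # Entries in the fullest filter, then equation (2).
    m = min(est_blocks, entries_per_filter)
    rate = (1.0 - math.exp(-k_hashes * m / n_bits)) ** k_hashes
    return est_blocks, num_filters, rate


# Hypothetical inputs: 1 MB of metadata at 50 bytes per content identifier.
blocks, filters, rate = estimate_false_positive_rate(
    1_000_000, 50, 10_000, 130_000, 3)
```

The estimate gives a starting point for choosing filter parameters when no earlier round of garbage collection is available to measure against.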

At block 608, the method 600 includes adjusting, based on thefalse-positive rate, one or more garbage collection parametersassociated with the slice services. At block 610, the method 600includes determining a target false-positive rate for the set of sliceservices based on the one or more adjusted garbage collectionparameters. In an example, the monitoring module 322 may adjust agarbage collection parameter by adjusting a size of the filter, a targetfullness of the filter, and/or a number of hashes per content identifierto yield the target false-positive rate determined at block 610. Thetarget false-positive rate may be a worst-case false-positive rate(e.g., five percent) for the set of slice services.

For example, after the block 610, the process flow may proceed from theblock 610 to the block 506 and so on, as discussed in relation to FIG.5. In this example, the target false-positive rate that is determined atblock 610 may be or correspond to the false-positive rate that isdetermined for the set of filters in block 504 in FIG. 5. For example,when executing block 506, the monitoring module 322 may determinewhether the target false-positive rate determined in block 610 satisfiesthe performance threshold.

In some examples, the monitoring module 322 may use other conditions to adjust the false-positive rate for the set of slice services. For example, the monitoring module 322 may determine, based on whether the cluster is under stress, whether to adjust the false-positive rate.

Although bloom filters and their parameters are discussed in the present disclosure, other garbage collection parameters may be adjusted. For example, other components that identify no false negatives, but potentially false positives, may be used to determine the false-positive rate, as discussed in the present disclosure.

FIG. 7 illustrates a flow diagram of a method 700 of adjusting thefalse-positive rate for the set of slice services based on a hardwareconstraint according to one or more aspects of the present disclosure.Blocks of the method 700 can be executed by a computing device (e.g., aprocessor, processing circuit, and/or other suitable component, such asof a storage node 303). For example, the slice service 306 and/or thecontent manager 320 (e.g., one or more components, such as themonitoring module 322, the adjustment module 324, and/or the garbagecollector 326) may execute one or more blocks of the method 700.

At block 702, the method 700 includes determining a false-positive rate for a set of slice services, the set of slice services residing in a cluster and including a set of content identifiers, and a set of filters indicating whether a set of data blocks referenced by the set of content identifiers is in-use. In an example, the monitoring module 322 may determine the false-positive rate for the set of slice services.

At block 704, the method 700 includes determining whether the set of slice services is under a hardware constraint. In an example, the monitoring module 322 determines whether the set of slice services is under a hardware constraint. The hardware constraint indicates a condition that has been identified as being significant enough to sacrifice the efficiency of the garbage collection process in order to improve system performance. The set of slice services may be under a hardware constraint if, for example, a central processing unit (CPU), memory, or disk usage satisfies a usage threshold and/or failure to address a quality of service (QoS) setting for volumes hosted on the node is detected. In an example, the QoS setting may be a minimum input/output operations per second (IOPS) setting. The minimum IOPS setting for a volume may refer to a guaranteed number of IOPS at which the volume will perform.

In some examples, the usage threshold may be a percentage of the resources that are available. In an example, the resource includes a central processing unit (CPU), with a usage threshold of ninety percent (as just one example of a numeric value). In another example, the resource includes a memory (e.g., random access memory (RAM)), with a usage threshold of ninety percent (as just one example of a numeric value). In another example, the resource includes a persistent data storage, with a usage threshold of seventy-five percent (as just one example of a numeric value).
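As a non-limiting illustration, the determination of block 704 may be sketched in Python as follows. The names NodeStats, USAGE_THRESHOLDS, and under_hardware_constraint are hypothetical and do not appear in the figures. The sketch further assumes that a usage value satisfies a threshold when it meets or exceeds the threshold, and that a failure to address the QoS setting means that the measured IOPS fall below the minimum IOPS setting.

```python
from dataclasses import dataclass

# Example percentages taken from the description; they are illustrative only.
USAGE_THRESHOLDS = {"cpu": 90.0, "memory": 90.0, "disk": 75.0}


@dataclass
class NodeStats:
    """Hypothetical snapshot of resource usage (percent) and QoS figures."""
    cpu: float
    memory: float
    disk: float
    measured_iops: float
    min_iops: float


def under_hardware_constraint(stats, thresholds=USAGE_THRESHOLDS):
    """Return True if any usage threshold is met or the minimum IOPS is missed."""
    # Assumption: "satisfies a usage threshold" is read as usage >= threshold.
    usage_hit = any(getattr(stats, name) >= limit
                    for name, limit in thresholds.items())
    # Assumption: the QoS setting is missed when measured IOPS < minimum IOPS.
    qos_missed = stats.measured_iops < stats.min_iops
    return usage_hit or qos_missed
```

In such a sketch, either condition alone is sufficient, which mirrors the "and/or" language used above.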

In some examples, the monitoring module 322 may operate in accordance with three critical stage levels (as just one example of a numeric value), with each stage indicating an increment in severity. For example, the first critical stage level may indicate a warning, the second critical stage level may indicate an error, and the third critical stage level may indicate a critical hardware constraint. At each of the levels, the monitoring module 322 may start restricting more I/O flow, with each incremental level causing an increase in the aggressiveness. The monitoring module 322 may also reduce or turn off some internal non-critical operations. For example, when the monitoring module 322 detects that a disk is at the third critical stage level, the monitoring module 322 may reject the client write and therefore cause write failures. A disk may be considered at the third critical stage level if the usage of the disk is at, for example, eighty-five to ninety percent (as just one example of a numeric value). Before the disk reaches the third critical stage (e.g., when the monitoring module 322 detects that the disk is at the first critical stage level or the second critical stage level), however, the monitoring module 322 may adjust the one or more filter parameters such that the garbage collection process runs more aggressively to remove more data to avoid running out of space. For example, if the monitoring module 322 detects that the memory is at the first critical stage level, the monitoring module 322 may request components in the system to free up memory. In this example, the garbage collector 326 may use smaller filter sizes. Although three critical stage levels are discussed, it should be understood that the monitoring module 322 may implement any number of critical stage levels (e.g., one, three, four, or more).
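By way of a hedged example, the mapping from disk usage to a critical stage level and to a response may be sketched as follows. Only the eighty-five percent floor for the third level is taken from the description; the floors for the first and second levels, as well as the names Stage, DISK_STAGE_FLOORS, disk_stage, and actions_for and the action labels, are assumptions chosen for illustration.

```python
from enum import IntEnum


class Stage(IntEnum):
    NORMAL = 0
    WARNING = 1    # first critical stage level
    ERROR = 2      # second critical stage level
    CRITICAL = 3   # third critical stage level


# Only the 85.0 floor comes from the description; 75.0 and 80.0 are hypothetical.
DISK_STAGE_FLOORS = [
    (85.0, Stage.CRITICAL),
    (80.0, Stage.ERROR),
    (75.0, Stage.WARNING),
]


def disk_stage(usage_percent):
    """Return the highest stage whose floor the disk usage has reached."""
    for floor, stage in DISK_STAGE_FLOORS:
        if usage_percent >= floor:
            return stage
    return Stage.NORMAL


def actions_for(stage):
    """Return illustrative responses for a stage (labels are hypothetical)."""
    if stage is Stage.CRITICAL:
        return ["reject_client_writes"]
    if stage in (Stage.WARNING, Stage.ERROR):
        return ["throttle_io", "run_gc_more_aggressively"]
    return []
```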

If the set of slice services is not under a hardware constraint, then it may be unnecessary to adjust the false-positive rate. In this instance, the method 700 may proceed to block 706. At block 706, the method 700 includes performing garbage collection on the set of data blocks in accordance with the set of filters. In an example, the garbage collector 326 performs garbage collection on the set of data blocks in accordance with the set of filters in block 702. For example, the garbage collector 326 may remove those data blocks of the set of data blocks that are not marked as in-use in the filter (e.g., not referenced by any client) and leave those data blocks of the set of data blocks that are marked as in-use in the filter.
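The removal step of block 706 may be illustrated with the following minimal sketch. The function name sweep is hypothetical, and any object supporting the Python `in` operator may stand in for a filter; plain sets are used in the usage example below, which, unlike a bloom filter, produce no false positives.

```python
def sweep(stored_blocks, filters):
    """Split stored blocks into those kept and those removed.

    stored_blocks maps a block identifier to its data. A block is kept if
    at least one filter marks its identifier as in-use, and removed otherwise.
    """
    kept = {}
    removed = []
    for block_id, data in stored_blocks.items():
        if any(block_id in f for f in filters):
            kept[block_id] = data       # marked in-use: leave in place
        else:
            removed.append(block_id)    # not marked in-use: reclaim
    return kept, removed


# Usage: two filters, one per slice service, modeled here as sets.
kept, removed = sweep({"a": b"1", "b": b"2", "c": b"3"}, [{"a"}, {"c"}])
```

With a probabilistic filter, a false positive would cause an unreferenced block to land in the kept group, which is why the false-positive rate bounds how much garbage a round can reclaim.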

In contrast, if the set of slice services is under a hardware constraint, then it may be desirable to increase the false-positive rate for the set of slice services to avoid a conflict between the garbage collection process and other storage processes. In this instance, the method 700 may proceed to block 708.

At block 708, the method 700 includes increasing, based on the hardware constraint, the false-positive rate for the set of slice services. The false-positive rate for the set of slice services may be based on a false-positive rate for each filter of the set of filters corresponding to the set of slice services. If the adjustment module 324 reduces a false-positive rate of a filter corresponding to the set of slice services, the false-positive rate of the set of slice services is also reduced. Similarly, if the adjustment module 324 increases a false-positive rate of a filter corresponding to the set of slice services, the false-positive rate of the set of slice services is also increased.

In an example in relation to block 708, the adjustment module 324 may increase, based on the hardware constraint, the false-positive rate for the set of slice services by adjusting one or more parameters of a filter of the set of filters. In some examples, the adjustment module 324 increases the false-positive rate of a filter by increasing the target fullness of the filter. If the adjustment module 324 increases the target fullness, then the filter may correspond to a greater number of slice services 306 and/or volumes hosted on the slice services 306. Although the adjustment module 324 may end up transmitting fewer filters to the block service 309 overall, the false-positive rate of the filter will be increased. In some examples, the adjustment module 324 increases the false-positive rate of the filter by decreasing the size of the filter (e.g., decreasing the number of bits stored in the filter). In some examples, the adjustment module 324 increases the false-positive rate of the filter by increasing the number of hashes per content identifier used for constructing the filter. By increasing the number of hashes, the number of entries in the filter corresponding to a block service 309 is also reduced.
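The effect of these parameters may be illustrated with the commonly used textbook approximation for a bloom filter, p ≈ (1 − e^(−kn/m))^k, where m is the number of bits, n is the number of inserted content identifiers, and k is the number of hashes. The present disclosure does not state this formula, so the sketch below, including the function name bloom_fpr, is an assumption offered for explanatory purposes only.

```python
import math


def bloom_fpr(num_bits, num_items, num_hashes):
    """Approximate bloom filter false-positive rate (textbook estimate)."""
    fill = 1.0 - math.exp(-num_hashes * num_items / num_bits)
    return fill ** num_hashes


baseline = bloom_fpr(num_bits=1000, num_items=200, num_hashes=4)
smaller = bloom_fpr(num_bits=500, num_items=200, num_hashes=4)      # fewer bits
fuller = bloom_fpr(num_bits=1000, num_items=400, num_hashes=4)      # more entries
more_hashes = bloom_fpr(num_bits=1000, num_items=200, num_hashes=8)  # more hashes
```

Under this approximation, shrinking the filter or loading it with more entries raises the estimate. Adding hashes raises the estimate in this particular configuration because the filter is already loaded beyond the point at which four hashes are near optimal; in a lightly loaded filter the same change may have the opposite effect.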

A relationship may exist between the false-positive rate and the hardware resources. For example, increasing the false-positive rate for the set of slice services may result in the garbage collection process running less efficiently, but may also reduce the allocation of random access memory (RAM) for filters. The increased false-positive rate discussed in relation to block 708 in FIG. 7 may correspond to the reduced false-positive rate of at least one filter discussed in relation to block 508 or the target false-positive rate discussed in relation to block 610 in FIG. 6.

At block 710, the method 700 includes updating the set of filters in accordance with the increased false-positive rate. In an example, the adjustment module 324 may update the set of filters in accordance with the adjustment of the filter parameters, as discussed in relation to block 708. After the set of filters is updated, the method 700 may proceed to block 706.

As illustrated, the method 700 includes a number of enumerated blocks, but embodiments of the method 700 may include additional blocks before, after, and in between the enumerated blocks. In some embodiments, one or more of the enumerated blocks may be omitted or performed in a different order. For example, after the set of filters is updated, the monitoring module 322 may determine whether the set of slice services is under a hardware constraint. If the monitoring module 322 determines that the set of slice services is no longer under a hardware constraint, the adjustment module 324 may return the false-positive rate for the set of slice services, which was increased in the block 708, to the determined false-positive rate, as discussed in relation to block 702. In this way, the adjustment module 324 may temporarily increase the false-positive rate for a particular metadata service in order to avoid a conflict between the garbage collection processes and other storage processes. Because the increase to the false-positive rate was triggered by the hardware constraint, the adjustment module 324 may return the false-positive rate back to its initial value in response to the set of slice services no longer being under the hardware constraint.
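The temporary increase and subsequent return of the false-positive rate may be sketched as a small piece of state, as shown below. The class name FalsePositiveRateController, the multiplicative raise_factor, and the ceiling are hypothetical; the description does not specify by how much the rate is increased.

```python
class FalsePositiveRateController:
    """Tracks the rate determined at block 702 and a temporarily raised rate."""

    def __init__(self, baseline_rate, raise_factor=2.0, ceiling=0.5):
        self.baseline_rate = baseline_rate   # rate determined at block 702
        self.raise_factor = raise_factor     # assumption: multiplicative raise
        self.ceiling = ceiling               # assumption: upper bound on the rate
        self.current_rate = baseline_rate

    def update(self, constrained):
        """Raise the rate while constrained (block 708), else restore it."""
        if constrained:
            raised = self.baseline_rate * self.raise_factor
            self.current_rate = min(raised, self.ceiling)
        else:
            self.current_rate = self.baseline_rate
        return self.current_rate
```

Because the baseline is stored separately from the current rate, repeated rounds under a constraint do not compound the increase, and the original value is recovered as soon as the constraint clears.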

In some examples, the content manager 320 may execute one or more blocks included in the methods 500, 600, and/or 700 in a sequential manner. For example, after the content manager 320 executes the block 510 of method 500, the content manager 320 may execute the blocks 704, 708, and/or 710, before reverting back to the method 500 at block 512. The content manager 320 may execute the method 700 and prioritize garbage collection or back off on how aggressive it is under certain conditions. For example, the content manager 320 may execute the method 700 if the false-positive rate does not satisfy the performance threshold (e.g., it is unnecessary to reduce the false-positive rate of the set of slice services), as discussed in relation to block 516 in FIG. 5. Accordingly, if the content manager 320 determines that there is a large amount of data that is not in-use (e.g., the client continues to write new data and then delete it) and should be cleared out through the garbage collection process, the content manager 320 may prioritize garbage collection and its efficiency to clear the cluster. In this example, the content manager 320 may adjust the garbage collection parameters to be more aggressive and efficient, which may come at the cost of system performance in other areas (e.g., memory consumption). If the content manager 320 determines that the false-positive rate does not satisfy the performance threshold (e.g., indicates an acceptable efficiency level for the garbage collection process) and determines that the set of slice services is under a hardware constraint, the content manager 320 may increase the false-positive rate for the set of slice services. In this example, the false-positive rate may be low enough that the false-positive rate can be increased in order to avoid a conflict between the garbage collection process and other storage processes.

FIG. 8 illustrates a flow diagram of a method 800 of performing garbage collection in a distributed data storage system according to one or more aspects of the present disclosure. Blocks of the method 800 can be executed by a computing device (e.g., a processor, processing circuit, and/or other suitable component, such as of a storage node 303). For example, the slice service 306 and/or the content manager 320 (e.g., one or more components, such as the monitoring module 322, the adjustment module 324, and/or the garbage collector 326) may execute one or more blocks of the method 800. As illustrated, the method 800 includes a number of enumerated blocks, but embodiments of the method 800 may include additional blocks before, after, and in between the enumerated blocks. In some embodiments, one or more of the enumerated blocks may be omitted or performed in a different order.

At block 802, the method 800 includes monitoring an efficiency level of a garbage collection process, the garbage collection process including removal of one or more data blocks of a set of data blocks that is referenced by a set of content identifiers, a set of slice services and the set of data blocks residing in a cluster, and a set of filters indicating whether the set of data blocks is in-use. The monitoring module 322 may monitor the efficiency level of the garbage collection process by, for example, using bloom filters (see, e.g., FIG. 5) or using efficiency sets.

FIG. 9 illustrates example efficiency sets in accordance with one or more aspects of the present disclosure. Efficiency sets 908 and 910 may be generated from groups of block identifiers 902 and 904 of a first and second volume, respectively. For explanatory purposes, block identifiers are shown as 4-digit binary numbers. However, any of the block identifiers as described herein may be utilized.

Accordingly, in column A, the sets of block identifiers 902 and 904 each include the block identifiers for the corresponding volume. In other words, the block identifiers of group 902 correspond to the data blocks of the first volume. The block identifiers of group 904 correspond to the data blocks of the second volume. In column B, a bitmask 906 has been applied to block identifier groups 902 and 904 such that the least significant bit of the block identifiers is masked to become “0” (i.e., a bitwise AND of the value “0” has been applied to the first digit of each identifier). The scope of the present disclosure is not limited to a particular type of mask to be applied to groups of block identifiers. For example, any one of the bits of the binary representation of a block identifier may be masked to be set on or off, and multiple bits may be masked at the same time. In an illustrative implementation, the type of mask to be applied may be selected based on the desired probability of accuracy (e.g., confidence) an administrator desires in a resulting efficiency set and in calculations using the efficiency set. For example, as more bits are masked, the effective level of precision of the bit identifiers is reduced, and the probability that the group of masked bit identifiers is an accurate representation of volume data is reduced.

After bitmask 906 has been applied, the resulting efficiency sets 908 and 910 may be seen with reference to column C. As depicted, the application of bitmask 906 to block identifiers group 902 resulted in a reduced set of masked identifiers as compared to the entire group of block identifiers. For example, block identifier “0111” of group 902 was masked to become “0110,” which was a duplicate entry in group 902, column B. As a result, the efficiency set 908 may store one instance of “0110”, and the duplicate entry of “0110” may be removed. Consequently, the memory footprint used to store efficiency set 908 is less than that of block identifier group 902, because fewer entries are stored (i.e., six entries in efficiency set 908 of volume 1 as compared to seven entries in block identifier group 902 for volume 1).

Additionally or alternatively, the bitmask 906 may also be applied to block identifier group 904 (i.e., the block identifiers corresponding to the second volume). As a result of applying the bitmask 906, efficiency set 910 is formed for the second volume, which includes four entries, as compared to the seven entries in group 904.
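The masking and de-duplication described above may be sketched in a few lines. The function name efficiency_set and the sample identifiers are hypothetical; the identifiers are not the contents of groups 902 or 904 in FIG. 9, although the masking of “0111” to “0110” matches the example given in the text.

```python
def efficiency_set(block_ids, mask=0b1110):
    """Clear the masked bits of each identifier and drop duplicates.

    The default mask clears the least significant bit of a 4-bit identifier.
    """
    return {block_id & mask for block_id in block_ids}


# Hypothetical identifiers for one volume (not the values shown in FIG. 9).
volume_ids = [0b0110, 0b0111, 0b1000, 0b1011]
masked = efficiency_set(volume_ids)
```

In this usage example, 0110 and 0111 collapse into a single entry, so four identifiers yield an efficiency set of three entries.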

The content manager 320 may estimate, based on a number of entries in an efficiency set for a volume, an amount of data that is in-use for the volume. For example, for each block identifier included in the efficiency set, the content manager 320 may determine an amount of data that is referenced by the block identifier. To determine an amount of data that is in-use in the cluster, the content manager 320 may continue to perform these calculations for each volume hosted on a slice service and further for each slice service in the cluster.

In some examples, the content manager 320 may create each efficiency set by scanning all block IDs from a data volume, applying a membership test (e.g., applying the bitmask), and adding those block IDs that passed the membership test. The membership test may become stricter as the scan proceeds, applying retroactively to previously admitted block IDs. The size of the set of block IDs at the end of the scan may represent a fraction of the total number of data blocks in the system referenced by that volume (or, equivalently, by a collection of volumes, if that scan extends over multiple volumes), and that fraction may be determined by the membership test (e.g., the bitmask). The content manager 320 may multiply by the inverse of that fraction, yielding a statistical estimate of the total number of unique data blocks referenced by the volume (or set of volumes). The total number of unique data blocks may be the estimated amount of data that is in-use for the volume, as discussed in relation to FIGS. 8 and 9.
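One way to realize a membership test that tightens during the scan is sketched below. The specific test used here (a block ID is admitted only if its lowest `bits` bits are all zero, so that the admitted fraction is 1/2^bits) is an assumption made for illustration and differs from the single-bit mask of FIG. 9; the names estimate_unique_blocks and max_entries are likewise hypothetical. The estimate is meaningful only if block IDs are approximately uniformly distributed, as would be expected of content-derived identifiers.

```python
def estimate_unique_blocks(block_ids, max_entries=1024):
    """Estimate the number of unique block IDs using a bounded sample."""
    bits = 0          # number of low-order bits that must be zero
    sample = set()    # admitted block IDs (duplicates collapse)
    for block_id in block_ids:
        if block_id & ((1 << bits) - 1) == 0:
            sample.add(block_id)
            # Tighten the test when the sample outgrows its budget and
            # re-apply it to the block IDs admitted earlier.
            while len(sample) > max_entries:
                bits += 1
                low_mask = (1 << bits) - 1
                sample = {b for b in sample if b & low_mask == 0}
    # The sample holds roughly 1 / 2**bits of the unique IDs, so scale
    # by the inverse of that fraction.
    return len(sample) * (1 << bits)
```

When the number of unique block IDs never exceeds max_entries, the test is never tightened and the function returns an exact count; otherwise memory stays bounded at the cost of a statistical estimate.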

Referring back to block 804 in FIG. 8, the method 800 includes determining whether an efficiency level is below an efficiency threshold. If the efficiency level is not below the efficiency threshold, then it may be unnecessary to adjust a parameter of a filter of the set of filters. In this instance, the method 800 may proceed to block 808. In contrast, if the efficiency level is below the efficiency threshold, then it may be desirable to adjust a parameter of a filter of the set of filters in order to improve the efficiency level of the garbage collection process.

In some examples, if the monitoring module 322 monitors the efficiency level of the garbage collection process by, for example, using bloom filters, the monitoring module 322 may determine that the efficiency level is below the efficiency threshold if the false-positive rate for the set of slice services satisfies the performance threshold, as discussed in relation to block 506 in FIG. 5. The false-positive rate for the set of slice services may satisfy the performance threshold if, for example, the false-positive rate exceeds the performance threshold.

In some examples, the monitoring module 322 monitors the efficiency level of the garbage collection process by, for example, using efficiency sets. The monitoring module 322 may determine that the efficiency level is below the efficiency threshold if a difference between an amount of data stored by a set of block services (e.g., block services 309 in FIG. 3) corresponding to the data blocks and an estimated amount of data blocks in-use in the cluster is greater than a critical value. For example, the critical value may be ten percent of the overall cluster capacity. A block service corresponds to a data block if the block service stores a mapping from a block identifier to a location of the data block.
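The comparison against the critical value may be expressed as the following sketch. The function and parameter names are hypothetical, all quantities are assumed to share one unit (e.g., bytes), and the default of ten percent is the example value given above.

```python
def efficiency_below_threshold(stored, estimated_in_use, cluster_capacity,
                               critical_fraction=0.10):
    """Return True if the unreclaimed excess exceeds the critical value."""
    excess = stored - estimated_in_use              # data stored but not in-use
    critical_value = critical_fraction * cluster_capacity
    return excess > critical_value
```

For instance, with a capacity of 1000 units, 500 units stored, and 350 units estimated in-use, the excess of 150 units exceeds the critical value of 100 units, so the efficiency level would be treated as below the efficiency threshold.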

If the efficiency level is below the efficiency threshold, the method 800 may proceed to block 806. At block 806, the method 800 includes adjusting at least one parameter of one or more filters of the set of filters. In an example, the adjustment module 324 may adjust a filter parameter to increase the efficiency level of the garbage collection process. The adjustment module 324 may increase the efficiency level of the garbage collection process by reducing a false-positive rate of at least one filter of the set of filters, as discussed in relation to aspects of FIG. 5.
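As one hedged example of such an adjustment, a filter may be resized to meet a target false-positive rate using the textbook bloom filter sizing relations m = −n·ln(p)/(ln 2)² and k = (m/n)·ln 2. These relations and the function name size_for_target_rate are assumptions; the present disclosure does not prescribe how the adjusted size or number of hashes is computed.

```python
import math


def size_for_target_rate(num_items, target_rate):
    """Return (num_bits, num_hashes) for a target false-positive rate.

    Uses the standard sizing relations for a bloom filter holding
    num_items entries; both outputs are rounded to whole numbers.
    """
    num_bits = math.ceil(-num_items * math.log(target_rate)
                         / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / num_items * math.log(2)))
    return num_bits, num_hashes


# Lowering the target rate from 1% to 0.1% calls for a larger filter.
bits_1pct, hashes_1pct = size_for_target_rate(1000, 0.01)
bits_01pct, hashes_01pct = size_for_target_rate(1000, 0.001)
```

The sketch makes the memory trade-off explicit: each reduction of the target rate is paid for with additional bits per entry.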

At block 808, the method 800 includes performing garbage collection on the set of data blocks in accordance with the set of filters. During a round of garbage collection, the garbage collector 326 may remove data blocks that have been indicated as not in-use by the filter and leave those data blocks that have been indicated as in-use by the filter.

At block 810, the method 800 includes determining whether to perform another round of garbage collection. The garbage collector 326 may determine whether to perform another round of garbage collection. If the garbage collector 326 determines to perform another round of garbage collection, then the method 800 may proceed back to block 802. In contrast, if the garbage collector 326 determines to not perform another round of garbage collection, then the method 800 may proceed to block 812. At block 812, the method 800 ends.
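The control flow of blocks 802 through 812 may be summarized by the following sketch, in which each block is supplied as a callable. The function name run_garbage_collection and the callable parameters are hypothetical placeholders for the monitoring module 322, the adjustment module 324, and the garbage collector 326.

```python
def run_garbage_collection(monitor, adjust, collect, another_round, threshold):
    """Run rounds of garbage collection and return how many were performed."""
    rounds = 0
    while True:
        level = monitor()            # block 802: monitor the efficiency level
        if level < threshold:        # block 804: below the efficiency threshold?
            adjust()                 # block 806: adjust filter parameter(s)
        collect()                    # block 808: perform garbage collection
        rounds += 1
        if not another_round():      # block 810: perform another round?
            return rounds            # block 812: end
```

Because monitoring occurs at the top of every iteration, an adjustment made in one round is evaluated against the efficiency observed in the next.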

The present embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. Accordingly, it is understood that any operation of the computing systems of computing architecture 100 may be implemented by the respective computing system using corresponding instructions stored on or in a non-transitory computer readable medium accessible by the processing system. For the purposes of this description, a tangible computer-usable or computer-readable medium can be any apparatus that can store the program for use by or in connection with the instruction execution system, apparatus, or device. The medium may include non-volatile memory including magnetic storage, solid-state storage, optical storage, cache memory, and RAM.

Thus, the present disclosure provides a system, method, and machine-readable storage medium for performing garbage collection in a distributed storage system. In some embodiments, the method includes monitoring an efficiency level of a garbage collection process, the garbage collection process including removal of one or more data blocks of a set of data blocks that is referenced by a set of content identifiers, a set of slice services and the set of data blocks residing in a cluster, and a set of filters indicating whether the set of data blocks is in-use; determining whether the efficiency level is below an efficiency threshold; adjusting at least one parameter of the one or more filters of the set of filters in response to determining that the efficiency level is below the efficiency threshold; and performing garbage collection on the set of data blocks in accordance with the set of filters.

In some examples, monitoring the efficiency level includes monitoring the set of filters. In some examples, monitoring the efficiency level includes determining an efficiency set for each volume hosted by a slice service of the set of slice services. In some examples, the set of filters includes a set of bloom filters. In some examples, at least one parameter of the one or more filters includes a size of the filter, a target fullness of the filter, and a number of hash functions used when constructing the filter. In some examples, determining whether the efficiency level is below the efficiency threshold includes determining whether a false-positive rate for the set of slice services exceeds a performance threshold. In some examples, determining whether the efficiency level is below the efficiency threshold includes determining whether a difference between an amount of data stored by a set of block services corresponding to the set of data blocks and an estimated amount of data blocks in-use in the cluster is greater than a critical value. In some examples, adjusting at least one parameter includes reducing a false-positive rate of a filter of the set of filters. In some examples, monitoring the efficiency level, determining whether the efficiency level is below the efficiency threshold, adjusting at least one parameter, and performing garbage collection occur for each round of garbage collection. In some examples, adjusting a parameter of a filter of the set of filters is in response to determining that the set of slice services is under a hardware constraint. In some examples, adjusting at least one parameter includes increasing a size of a filter of the one or more filters, decreasing a target fullness of the filter, or decreasing a number of hashes per content identifier used for constructing the filter.

In yet further embodiments, the non-transitory machine-readable medium has instructions for performing garbage collection in a distributed storage system, including machine executable code which when executed by at least one machine, causes the machine to: monitor a set of filters corresponding to a set of slice services, the set of slice services residing in a cluster and including a set of content identifiers, and the set of filters indicating whether a set of data blocks referenced by the set of content identifiers is in-use; determine whether a false-positive rate for the set of slice services satisfies a performance threshold; adjust the false-positive rate of at least one filter of the set of filters in response to determining that the false-positive rate satisfies the performance threshold; and perform garbage collection on the set of data blocks in accordance with the set of filters.

In some examples, the false-positive rate satisfies the performance threshold if the false-positive rate exceeds the performance threshold. In some examples, the false-positive rate is based on the set of filters. In some examples, the non-transitory machine-readable medium further includes code, which causes the machine to update the set of filters in accordance with the adjusted false-positive rate, where the garbage collection is performed in accordance with the updated set of filters. In some examples, the non-transitory machine-readable medium further includes code, which causes the machine to adjust the false-positive rate by reducing the false-positive rate of a filter of the set of filters.

In yet further embodiments, a computing device includes a memory containing a machine-readable medium comprising machine executable code having stored thereon instructions for performing garbage collection in a distributed storage system; and a processor coupled to the memory. The processor is configured to execute the machine executable code to, for each round of garbage collection: monitor an efficiency level of a garbage collection process for a set of data blocks referenced by a set of content identifiers, the set of data blocks residing in a cluster, and a set of filters indicating whether the set of data blocks is in-use; determine whether the efficiency level is below an efficiency threshold; reduce a false-positive rate of a filter of the set of filters; and perform garbage collection on the set of data blocks in accordance with the set of filters.

In some examples, the processor is configured to execute the machine executable code to reduce the false-positive rate of the filter by increasing a size of the filter. In some examples, the processor is configured to execute the machine executable code to reduce the false-positive rate of the filter by decreasing a target fullness of the filter. In some examples, the processor is configured to execute the machine executable code to reduce the false-positive rate of the filter by decreasing a number of hashes per content identifier used for constructing the filter.

The foregoing outlines features of several embodiments so that those skilled in the art may better understand the aspects of the present disclosure. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.

What is claimed is:
 1. A non-transitory machine-readable medium having stored thereon instructions for performing garbage collection in a distributed storage system, comprising machine executable code which when executed by at least one machine, causes the machine to: monitor a set of filters corresponding to a set of slice services, the set of slice services residing in a cluster and including a set of content identifiers, and the set of filters indicating whether a set of data blocks referenced by the set of content identifiers is in-use; select an approach of a plurality of approaches for estimating a false-positive rate for the set of slice services based on a state of the distributed storage system; estimate the false-positive rate based on the selected approach; determine whether the estimated false-positive rate satisfies a performance threshold; adjust a false-positive rate of at least one filter of the set of filters in response to determining that the estimated false-positive rate satisfies the performance threshold; and perform garbage collection on the set of data blocks in accordance with the set of filters.
 2. The non-transitory machine-readable medium of claim 1, wherein the estimated false-positive rate satisfies the performance threshold if the false-positive rate exceeds the performance threshold.
 3. The non-transitory machine-readable medium of claim 1, wherein the estimated false-positive rate is based on one or more parameters of the set of filters.
 4. The non-transitory machine-readable medium of claim 1, further comprising code, which causes the machine to: update the set of filters in accordance with the adjusted false-positive rate, wherein the garbage collection is performed in accordance with the updated set of filters.
 5. The non-transitory machine-readable medium of claim 1, further comprising code, which causes the machine to: adjust the false-positive rate by reducing the false-positive rate of a filter of the set of filters.
 6. The non-transitory machine-readable medium of claim 1, wherein the state of the distributed storage system relates to an amount of dynamic information on one or more volumes hosted by one or more slice services of the set of slice services or the one or more slice services having been rebalanced.
 7. The non-transitory machine-readable medium of claim 1, wherein the set of filters comprise bloom filters, wherein the selected approach for estimating the false-positive rate for the set of slice services is based on one or more parameters of the set of bloom filters, and wherein the estimated false-positive rate represents a worst-case false-positive rate for the set of slice services.
 8. The non-transitory machine-readable medium of claim 1, wherein the selected approach for estimating the false-positive rate for the set of slice services involves use of a statistical estimate of a total number of unique data blocks referenced by one or more volumes hosted by one or more slice services of the set of slice services.
 9. The non-transitory machine-readable medium of claim 1, wherein the selected approach for estimating the false-positive rate for the set of slice services involves use of efficiency sets determined for respective volumes hosted by the set of slice services.
 10. The non-transitory machine-readable medium of claim 1, wherein the selected approach for estimating the false-positive rate for the set of slice services includes determining whether a difference between an amount of data stored by a set of block services corresponding to the set of data blocks and an estimated amount of data blocks in-use in the cluster is greater than a critical value.
 11. A method comprising: monitoring a set of bloom filters corresponding to a set of slice services, the set of slice services residing in a cluster and including a set of content identifiers, and the set of bloom filters indicating whether a set of data blocks referenced by the set of content identifiers is in-use; selecting an approach of a plurality of approaches for estimating a false-positive rate for the set of slice services based on a state of the distributed storage system; estimating the false-positive rate based on the selected approach; determining whether the estimated false-positive rate satisfies a performance threshold; adjusting a false-positive rate of at least one bloom filter of the set of bloom filters in response to determining that the estimated false-positive rate satisfies the performance threshold; and performing garbage collection on the set of data blocks in accordance with the set of bloom filters.
 12. The method of claim 11, wherein the state of the distributed storage system relates to an amount of dynamic information on one or more volumes hosted by one or more slice services of the set of slice services or the one or more slice services having been rebalanced.
 13. The method of claim 11, wherein the selected approach for estimating the false-positive rate for the set of slice services is based on one or more parameters of the set of bloom filters.
 14. The method of claim 11, wherein the selected approach for estimating the false-positive rate for the set of slice services involves use of efficiency sets determined for respective volumes hosted by the set of slice services.
 15. A computing device comprising: a memory containing a machine-readable medium comprising machine executable code having stored thereon instructions for performing garbage collection in a distributed storage system; and a processor coupled to the memory, the processor configured to execute the machine executable code to: monitor a set of filters corresponding to a set of slice services, the set of slice services residing in a cluster and including a set of content identifiers, and the set of filters indicating whether a set of data blocks referenced by the set of content identifiers is in-use; select an approach of a plurality of approaches for estimating a false-positive rate for the set of slice services based on a state of the distributed storage system; estimate the false-positive rate based on the selected approach; determine whether the estimated false-positive rate satisfies a performance threshold; adjust a false-positive rate of at least one filter of the set of filters in response to determining that the estimated false-positive rate satisfies the performance threshold; and perform garbage collection on the set of data blocks in accordance with the set of filters.
 16. The computing device of claim 15, wherein the processor is further configured to execute the machine executable code to update the set of filters in accordance with the adjusted false-positive rate, wherein the garbage collection is performed in accordance with the updated set of filters.
 17. The computing device of claim 15, wherein the state of the distributed storage system relates to an amount of dynamic information on one or more volumes hosted by one or more slice services of the set of slice services or the one or more slice services having been rebalanced.
 18. The computing device of claim 15, wherein the set of filters comprise bloom filters and wherein the selected approach for estimating the false-positive rate for the set of slice services is based on one or more parameters of the set of bloom filters.
 19. The computing device of claim 15, wherein the selected approach for estimating the false-positive rate for the set of slice services involves use of efficiency sets determined for respective volumes hosted by the set of slice services.
 20. The computing device of claim 15, wherein the selected approach for estimating the false-positive rate for the set of slice services includes determining whether a difference between an amount of data stored by a set of block services corresponding to the set of data blocks and an estimated amount of data blocks in-use in the cluster is greater than a critical value.